TopHat: discovering splice junctions with RNA-Seq
Abstract
MOTIVATION: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or 'reads', can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.
RESULTS: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20,000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.
AVAILABILITY: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
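The throughput quoted in the results (nearly 2.2 million reads per CPU hour) can be turned into a rough capacity estimate. The sketch below is only back-of-the-envelope arithmetic, not part of TopHat: the function names and the 40-million-read experiment size are hypothetical, chosen for illustration, and the rate is treated as exactly 2.2 million although the abstract says "nearly".

```python
# Throughput quoted in the abstract, rounded: ~2.2 million reads per CPU hour.
READS_PER_CPU_HOUR = 2_200_000


def cpu_hours_needed(total_reads: int,
                     reads_per_cpu_hour: int = READS_PER_CPU_HOUR) -> float:
    """CPU hours needed to map total_reads at the given rate."""
    return total_reads / reads_per_cpu_hour


def reads_per_day(cores: int = 1,
                  reads_per_cpu_hour: int = READS_PER_CPU_HOUR) -> int:
    """Reads mapped in 24 hours, assuming perfect scaling across cores."""
    return reads_per_cpu_hour * 24 * cores


if __name__ == "__main__":
    # Hypothetical experiment size, for illustration only.
    example_reads = 40_000_000
    print(f"{cpu_hours_needed(example_reads):.1f} CPU hours")
    print(f"{reads_per_day():,} reads per CPU day")
```

Under these assumptions a single core maps about 52.8 million reads in 24 hours, which is the sense in which a full experiment fits in less than a day on one desktop machine.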
Related articles
PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data
MOTIVATION RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient m...
MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery
The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long reads (≥ 75 bp). ...
Unbiased comparison of alignment tools for splice junction detection from RNA-Seq data
RNA-Seq makes it possible not only to measure gene expression but also to identify and quantify transcript isoforms in different experimental conditions. Although a large number of tools allow to infer isoform expression levels from RNA-Seq data, the quantification of alternative splicing variants remains challenging. Aligning reads that span (possibly un-annotated) exon-exon junctions in an ef...
UnSplicer: mapping spliced RNA-seq reads in compact genomes and filtering noisy splicing
Accurate mapping of spliced RNA-Seq reads to genomic DNA has been known as a challenging problem. Despite significant efforts invested in developing efficient algorithms, with the human genome as a primary focus, the best solution is still not known. A recently introduced tool, TrueSight, has demonstrated better performance compared with earlier developed algorithms such as TopHat and MapSplice...
HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data
BACKGROUND High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. ME...
Journal title:
- Bioinformatics
Volume 25, Issue
Pages -
Publication date 2009